22:10
2025-12-10
rosmine.ai
large-language-models
A new test for if your LLM is subtly manipulating you
A method using Contrastive Decoding to detect when a large language model (LLM) is subtly suppressing information, such as avoiding mentions of a competitor's product. The author trained a "Manipulato…